Multi-pass data organization and automatic naming

ABSTRACT

A method and a system to organize a data set into groups of data subsets in multiple passes using different parameters and to automatically name the groups are disclosed. For example, a data set is retrieved in accordance with a search query submitted by a user. The data set is organized into clusters based on a statistic(s) of the data set. The data set is then organized into groups of data subsets based on an attribute(s) indicated by the data set. Each of the groups is automatically named based on a property shared by data units of the group. The name(s) of a group may be mined from the data units of the group, retrieved from a structure that maps to attribute values indicated by the data units of the group, etc.

This application is a continuation of U.S. patent application Ser. No. 12/771,958, filed Apr. 30, 2010, now U.S. Pat. No. 7,933,877, which is a continuation of U.S. patent application Ser. No. 11/646,905, filed Dec. 28, 2006, now U.S. Pat. No. 7,739,247, entitled “Multi-Pass Data Organization and Automatic Naming,” both of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present application relates generally to the technical field of statistical data analysis and, in one specific example, to organizing data into groups with multiple passes of different organization techniques, and automatically naming the groups.

BACKGROUND

When searching for products or services online, a user is often presented with a large number of results. In an attempt to avoid overwhelming a user, data is generally organized in a manner that, hopefully, enhances the user experience. For example, the data is generally organized in a manner that allows a user to quickly glean useful information from the presented search-query results.

Some example challenges that may exist with current techniques include the presentation of data in a meaningful manner and the expense of preparing data for the presentation. For example, data may be presented based on statistics that are user driven. However, presenting data solely based on user driven statistics may not be meaningful since user behavior is not constrained by a particular attribute of the product or service of interest. For instance, a user searching for a book about automotive repair may be presented with recommended books about cooking. Although user statistics may indicate that users often purchase these two types of books, there is no objective attribute-based reason for recommending a cooking book to a user looking at an automotive book. It may be that many user accounts are shared by married couples with diverse interests. However, such an underlying cause for data association is imperceptible to the user. To address this possible challenge, the book data may be tagged. However, determining tags and then tagging a vast database of data may require expenditure of significant resources.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which:

FIG. 1 is a graphical depiction of an example of a system that presents a data set in accordance with multi-pass organizing.

FIG. 2 is a graphical representation of an example of a process for organizing a data set.

FIG. 3 is a flowchart that depicts an example of a process for automatically organizing a data set into groups and automatically naming the groups.

FIG. 4 is a flowchart that depicts an example of a process for determining a shared property.

FIG. 5 is a conceptual depiction of an example of a name structure.

FIG. 6 is a flowchart that depicts an example process for retrieving a name from a name structure.

FIG. 7 is a network diagram depicting a client-server system 700, within which one example embodiment may be deployed.

FIG. 8 is a block diagram illustrating multiple applications 720 and 722 that, in one example embodiment, are provided as part of the networked system 702.

FIG. 9 is a high-level entity-relationship diagram, illustrating various tables 900 that may be maintained within the databases 726, and that are utilized by and support the applications 720 and 722.

FIG. 10 provides further details regarding the attribute tables that are shown in FIG. 9 to be maintained within the databases 726.

FIG. 11 shows a diagrammatic representation of a machine in the example form of a computer system 1100 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.

DETAILED DESCRIPTION

Example methods and systems to organize a data set into groups with multiple passes based on different parameters and to automatically name each of the groups are described. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of example embodiments. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details. For instance, examples are described with reference to clustering techniques, but details of clustering techniques are not exhaustively described, particularly in light of the plethora of clustering techniques that can be utilized within different classes of clustering, such as hierarchical clustering and partitional clustering. In another instance, details of organizing are not exhaustively described since one of ordinary skill in the art will appreciate that embodiments may employ any of a number of techniques, such as, but not limited to: writing data units of different groups into different locations in physical or virtual memory; writing data units of different groups into different memory or storage devices; rearranging data units; automatically tagging data units to indicate the organization; updating data with pointers; both writing data to a location and tagging the data; etc.

Automating organization of data allows the expense of manual tagging to be avoided. In one implementation, a data set is organized into groups with two passes using different parameters (e.g., a statistic and an attribute value), and then each group is automatically named. Statistics extrapolated from a data set and/or statistics collected in advance drive the organizing of the data set into data subsets or clusters. The data set is then organized with respect to one or more attributes. Although the second organizing pass is separate from the first organizing pass, the second organizing pass is performed with respect to the subsets or clusters resulting from the first pass. The second pass may or may not partition data units of one or more clusters created by the first organizing pass. The second organizing pass results in various groups of data units from the data set. Each of the groups is then automatically named.

For example, consider a hypothetical data set in which each data unit represents a respective type of digital camera. The data units are first organized based on a statistic, such as the number of shared attributes. Based on the statistic(s) of shared attributes, the first organizing pass may create clusters that are mostly divided between the categories of professional cameras and convenience cameras. In that example, each data unit indicates the weight of the respective type of digital camera associated with the data unit. An example attribute value for the second organizing pass may be a weight of 0.5 pounds. The second organizing pass then distinguishes between those data units that indicate a weight greater than 0.5 pounds and those data units that indicate a weight equal to or less than 0.5 pounds. Indicated weights greater than 0.5 pounds map to a name “heavy” and indicated weights less than or equal to 0.5 pounds map to a name “light.” Each group of data units that results from the multiple organizing passes will satisfy both the statistical organizing and the structural organizing (or attribute driven organizing). Hence, statistical organizing followed by attribute driven organizing allows for quality clustering of data units that can be presented to a user in a meaningful manner. In this example, the presentation is meaningful to a user since a user can determine that a professional camera is generally “heavy,” and that a convenience camera is included in the group because it is also “heavy.” Since the drivers of the organizing are based on data that is already available or automatically generated, the expense of tagging data can be avoided, although the organizing can be augmented with tagging. This example is provided to depict a particular grouping of data that may seem meaningless or imperceptible without the naming.
As another example, data units in a first group named “professional” may be grouped together because the member data units indicate a number of pixels that exceeds a pre-defined threshold for those types of digital cameras associated with the data units in the first group.
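The camera example can be sketched as two small passes. This is a minimal illustration under stated assumptions, not the disclosed implementation: the camera records, the feature names, the `min_shared=2` linking rule (single-link clustering on the count of shared attributes), and the functions `first_pass`, `second_pass`, and `weight_name` are all hypothetical. Only the 0.5-pound threshold and the “heavy”/“light” names come from the text.

```python
from collections import defaultdict

# Hypothetical camera data units: a set of feature attributes plus a weight in pounds.
CAMERAS = [
    {"id": "cam-a", "features": {"interchangeable_lens", "raw", "hot_shoe"}, "weight_lb": 1.6},
    {"id": "cam-b", "features": {"interchangeable_lens", "raw", "weather_sealed"}, "weight_lb": 2.1},
    {"id": "cam-c", "features": {"auto_mode", "compact", "raw"}, "weight_lb": 0.4},
    {"id": "cam-d", "features": {"auto_mode", "compact", "touchscreen"}, "weight_lb": 0.3},
    {"id": "cam-e", "features": {"auto_mode", "compact", "optical_zoom"}, "weight_lb": 0.7},
]


def first_pass(units, min_shared=2):
    """Statistic-driven pass: link units sharing >= min_shared features (single-link)."""
    parent = list(range(len(units)))  # union-find forest over unit indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            if len(units[i]["features"] & units[j]["features"]) >= min_shared:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, unit in enumerate(units):
        clusters[find(i)].append(unit)
    return list(clusters.values())


def weight_name(unit, threshold_lb=0.5):
    # Weights above the threshold map to "heavy"; at or below it map to "light".
    return "heavy" if unit["weight_lb"] > threshold_lb else "light"


def second_pass(clusters):
    """Attribute-driven pass: split each cluster by weight name, keeping the name."""
    groups = []
    for cluster in clusters:
        by_name = defaultdict(list)
        for unit in cluster:
            by_name[weight_name(unit)].append(unit["id"])
        groups.extend(by_name.items())
    return groups


groups = second_pass(first_pass(CAMERAS))
print(groups)
```

With this data the first pass yields a professional-style cluster (`cam-a`, `cam-b`) and a convenience-style cluster (`cam-c`, `cam-d`, `cam-e`). The second pass leaves the first cluster intact as “heavy” and splits the second into “light” (`cam-c`, `cam-d`) and “heavy” (`cam-e`).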

FIG. 1 is a graphical depiction of an example of a system that presents a data set in accordance with multi-pass organizing. A client machine 101 transmits a query to a server 103 via a network 102 (e.g., one or more of the Internet, a LAN, a WAN, etc.). The server 103 is operationally connected to one or more database servers 109 and 115. In one implementation, the server 103 includes a navigation module 105. The navigation module 105 handles the query from the client machine 101 and sends the query to the database server 109. The navigation module 105 allows a user to search, navigate, and/or browse data. The database server 109 accesses the database 111 to retrieve a data set in accordance with the query. The database server 109 sends the data set retrieved from the database 111 to the navigation module 105. The navigation module 105 sends the data set to an organizing module 107. The organizing module 107 queries a database server 115 for statistic(s) relevant to the data set. Although the database servers 115 and 109, as well as the databases 111 and 117, are depicted as separate, the servers 109 and 115 and/or the databases 111 and 117 need not be separate. The database server 115 accesses a database 117 and retrieves the statistic(s). The database server 115 sends the retrieved statistic(s) to the organizing module 107. The organizing module 107 organizes the data set based on the statistic(s) into clusters or data subsets. The organizing module 107 then organizes the data set based on an attribute(s) of the data set into groups. The organizing module 107 automatically names the resulting groups. The organizing module 107 sends the resulting groups to the navigation module 105. The navigation module 105 presents the groups with names to the client machine 101 via the network 102. Presenting the data set in accordance with the groupings and with the names allows a user at the client machine 101 to perceive the underlying basis for the particular organization of the data set.
Of course, it is not necessary to indicate the names to the user. The names may be utilized for a variety of purposes and/or in a variety of ways (e.g., utilized to maintain the data; not revealed to a user unless responsive to a request; presenting or revealing fewer than all of the names; etc.).

It should be understood that the system depicted in FIG. 1 is an example. Although depicted as singular, any one or more of the servers 103, 109, and 115 may be implemented as multiple servers. In addition, the functionality for organizing a data set in multiple passes can be implemented in accordance with a variety of techniques. For instance, the navigation module 105 and the organizing module 107 may be implemented as a single module, such as an information guide (e.g., a web-based guide, an application for mobile devices, etc.). In another example, a separate module or modules may implement the automatic naming functionality and/or the statistic generation functionality (e.g., a statistic extrapolation module). For example, groups resulting from an organized data set may be sent to a module that examines the groups to automatically name the groups, that accesses a structure or database to look up names based on a shared property, etc. Moreover, the modules may be wholly or partially implemented in hardware (e.g., an application specific integrated circuit may perform the organizing functionality, names may be hosted in a fast look-up table, etc.). Furthermore, the modules may be implemented across multiple servers.

FIG. 2 is a graphical representation of an example of a process for organizing a data set. A data set 202 includes data units 201a-201l. Each of the data units 201a-201l represents an item that may be abstract or concrete, such as a product, service, real estate, stock, etc. Each of the data units 201a-201l indicates a value(s) for one or more attributes of the represented item. With a first organizing pass driven by a statistic(s) about the data set 202, the data set 202 is organized into clusters 203a-203d of similar data units. The cluster 203a includes the data units 201a-201b, 201g, 201j, and 201l. The cluster 203b includes the data units 201c and 201k. The cluster 203c includes the data units 201e, 201h, and 201i. The cluster 203d includes the data units 201d and 201f. Similarity can be driven in accordance with a number of techniques (or combination of techniques) such as number of attributes in common, amount of text shared among item descriptions, a taxonomy, statistical co-occurrence of items, behavioral evidence (visitors browsing one item and then another, considered as evidence of similarity), etc. Similarity does not require every data unit in a cluster to have identical or nearly identical values. For example, similarity may depend on the utilized clustering technique(s) (e.g., k-means clustering, fuzzy clustering, agglomerative clustering, etc.).
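As one concrete instance of the agglomerative clustering mentioned above, the sketch below measures similarity as the Jaccard ratio of shared attribute values and merges clusters by average linkage until no pair is similar enough. The choice of Jaccard similarity, average linkage, the `0.3` stopping threshold, and the sample attribute sets are all assumptions made for illustration; the text does not fix any of them.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity as shared attributes over the union of attributes (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def agglomerative(items, min_similarity=0.3):
    """Average-linkage agglomerative clustering over attribute sets.

    items: dict mapping a data-unit id to its set of attribute values.
    Repeatedly merges the two most similar clusters until no pair's average
    pairwise similarity reaches min_similarity.
    """
    clusters = [[key] for key in sorted(items)]

    def linkage(c1, c2):
        sims = [jaccard(items[x], items[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical attribute sets for four data units.
units = {
    "201a": {"slr", "raw", "full_frame"},
    "201b": {"slr", "raw", "aps_c"},
    "201c": {"compact", "zoom", "wifi"},
    "201d": {"compact", "zoom", "touchscreen"},
}
print(agglomerative(units))
```

Here the units sharing two of four distinct attributes (similarity 0.5) are merged, and the two resulting clusters share nothing, so merging stops. Note that the clustered units are not identical; they are only similar under the chosen measure.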

A second organizing pass driven by one or more attributes organizes the data set 202 into groups 205a-205e. The groups 205b-205d include the same data units as the clusters 203b-203d, respectively. The second organizing pass has partitioned the cluster 203a into groups 205a and 205e. After the second organizing pass, the groups are automatically named. However, the process is not limited to two passes. The groups may be refined, cluster quality improved, etc., with additional passes over the data set before and/or after the first and second organizing passes. For example, a clustering technique may be applied to the groups resulting from the second organizing pass with a different or the same statistic as that employed for the first organizing pass. Alternatively, the data set may be organized again with respect to different attributes.
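The FIG. 2 outcome can be reproduced with a second pass that splits each cluster by the value of one attribute. The cluster memberships below come from the text. The `CONDITION` attribute and its values are hypothetical, chosen so that only cluster 203a is mixed. `partition_by_attribute` is an illustrative helper, not a disclosed function.

```python
from collections import defaultdict

# Clusters 203a-203d from FIG. 2 (membership as described in the text).
CLUSTERS = {
    "203a": ["201a", "201b", "201g", "201j", "201l"],
    "203b": ["201c", "201k"],
    "203c": ["201e", "201h", "201i"],
    "203d": ["201d", "201f"],
}

# Hypothetical attribute values; only cluster 203a is mixed on this attribute.
CONDITION = {
    "201a": "new", "201b": "new", "201j": "new",
    "201g": "used", "201l": "used",
    "201c": "used", "201k": "used",
    "201e": "new", "201h": "new", "201i": "new",
    "201d": "refurbished", "201f": "refurbished",
}


def partition_by_attribute(clusters, attribute):
    """Second pass: split each cluster into groups that share one attribute value.

    A cluster whose members all share the value passes through unchanged.
    Returns (source cluster, attribute value, members) triples.
    """
    groups = []
    for cluster_id, members in clusters.items():
        by_value = defaultdict(list)
        for unit in members:
            by_value[attribute[unit]].append(unit)
        for value, group in by_value.items():
            groups.append((cluster_id, value, group))
    return groups


groups = partition_by_attribute(CLUSTERS, CONDITION)
for cluster_id, value, members in groups:
    print(cluster_id, value, members)
```

The result has five groups, matching the count of groups 205a-205e. Clusters 203b-203d pass through unchanged, and 203a is split in two. The attribute value retained with each group is a natural input to the naming step.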

In addition, several cluster-quality metrics may be used. Two particular examples of cluster-quality metrics include the ratio of cross-cluster similarity to global similarity (i.e., a measure of whether items from different clusters are, in general, less similar than items from the same cluster) and spectral clustering. Beyond cluster quality, the quality of both the description and the name of an item can be measured. Moreover, the description and name can be used as a criterion, or as part of the criteria, for determining which of the results of clustering to accept.
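One plausible reading of the cross-cluster-to-global ratio is sketched below. It takes the mean similarity over pairs of items in different clusters and divides it by the mean similarity over all pairs, so lower values suggest better-separated clusters. The Jaccard similarity, the function name `cross_to_global_ratio`, and the sample data are assumptions. The text names the metric but does not define its formula.

```python
from itertools import combinations


def jaccard(a, b):
    """Similarity as shared attributes over the union of attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cross_to_global_ratio(items, clusters):
    """Mean similarity of cross-cluster pairs divided by mean similarity of all pairs.

    Values below 1.0 mean items in different clusters are, on average, less
    similar than items overall, which suggests a better clustering.
    """
    label = {unit: idx for idx, members in enumerate(clusters) for unit in members}
    all_sims, cross_sims = [], []
    for x, y in combinations(sorted(items), 2):
        sim = jaccard(items[x], items[y])
        all_sims.append(sim)
        if label[x] != label[y]:
            cross_sims.append(sim)
    if not cross_sims or not any(all_sims):
        raise ValueError("need at least two clusters and nonzero overall similarity")
    return (sum(cross_sims) / len(cross_sims)) / (sum(all_sims) / len(all_sims))


# Hypothetical attribute sets for four data units.
UNITS = {
    "201a": {"slr", "raw", "full_frame"},
    "201b": {"slr", "raw", "aps_c"},
    "201c": {"compact", "zoom", "wifi"},
    "201d": {"compact", "zoom", "touchscreen"},
}
good = cross_to_global_ratio(UNITS, [["201a", "201b"], ["201c", "201d"]])
bad = cross_to_global_ratio(UNITS, [["201a", "201c"], ["201b", "201d"]])
print(good, bad)
```

The clustering that keeps similar units together scores 0.0, because no cross-cluster pair shares an attribute. The clustering that separates them scores about 1.5, so it would be rejected under this reading of the metric.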

FIG. 3 is a flowchart that depicts an example of a process for automatically organizing a data set into groups and automatically naming the groups. At block 301, a data set that includes a plurality of data units is received. At block 303, statistical clustering is applied to the data set to organize the plurality of data units into clusters based on one or more statistics of the plurality of data units. Virtually any statistic associated with the various data units may be used, including, for example, user selection or purchasing behavior, attributes shared among the plurality of data units, textual data shared among the data units, etc. A clustering technique may search over all possible small decision trees or short formulas to find a clustering that has high quality, a good name, and a good cluster description. The cluster description is essentially a query that reproduces the cluster (from the set of data units under consideration). For example, a cluster description for digital cameras may be a search query like “SLRs costing more than $2000.”
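A search over short formulas for a cluster description could look like the brute-force sketch below. It enumerates conjunctions of up to two predicates (equality on categorical attributes, “greater than” on numeric ones, with thresholds taken from observed values). It then keeps the formula whose selected units best match the cluster by F1 score. The listing data, `candidate_predicates`, `describe`, and the F1 criterion are assumptions for illustration, not the disclosed search.

```python
from itertools import combinations

# Hypothetical listings: id -> attributes (each attribute has one value type).
LISTINGS = {
    "c1": {"type": "SLR", "price": 2400},
    "c2": {"type": "SLR", "price": 3100},
    "c3": {"type": "SLR", "price": 900},
    "c4": {"type": "compact", "price": 2600},
    "c5": {"type": "compact", "price": 300},
}


def candidate_predicates(units):
    """Build (text, test) pairs: '=' on strings, '>' on numbers seen in the data."""
    preds = []
    keys = sorted({k for attrs in units.values() for k in attrs})
    for key in keys:
        for v in sorted({attrs[key] for attrs in units.values() if key in attrs}):
            if isinstance(v, (int, float)):
                preds.append((f"{key} > {v}", lambda a, k=key, t=v: k in a and a[k] > t))
            else:
                preds.append((f"{key} = {v}", lambda a, k=key, t=v: a.get(k) == t))
    return preds


def describe(cluster, units, max_terms=2):
    """Return the shortest conjunction of predicates that best reproduces `cluster`.

    Score is F1 between the units the formula selects and the cluster members;
    only strictly better scores replace the current best, so shorter formulas win ties.
    """
    preds = candidate_predicates(units)
    best_text, best_score = None, -1.0
    for n in range(1, max_terms + 1):
        for combo in combinations(preds, n):
            selected = {u for u, a in units.items() if all(test(a) for _, test in combo)}
            hits = len(selected & cluster)
            if hits == 0:
                continue
            precision, recall = hits / len(selected), hits / len(cluster)
            score = 2 * precision * recall / (precision + recall)
            if score > best_score:
                best_text, best_score = " and ".join(t for t, _ in combo), score
    return best_text, best_score


print(describe({"c1", "c2"}, LISTINGS))
```

For the cluster `{c1, c2}` no single predicate is exact, but the two-term formula `price > 900 and type = SLR` reproduces it. This is the same shape as the text's “SLRs costing more than $2000,” with the threshold drawn from this hypothetical data.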

At block 305, the plurality of data units is organized into groups based on one or more attributes indicated by the plurality of data units. Attributes can vary greatly in relation to the item being represented by the data units. For example, the data units may represent digital cameras. Examples of digital camera attributes include number of pixels, screen size, zoom, weight, price range, price, power source, etc. Examples of attributes for land items include acreage, lot size, terrain, available water source, rural or urban, improved, available utilities, etc. Examples of attributes for service items include level of experience, schedule availability, recommendation, geographic proximity, certifications, etc. One of ordinary skill in the art should appreciate the broad spectrum of items that can be represented and the wide array of corresponding attributes that can be utilized for organizing.

After block 305, blocks 307 and 309 are performed for each group of data units resulting from the second organizing pass. A shared property may, for example, be based on an attribute. Attribute values may be mapped to a name (e.g., a phrase or label associated with an attribute value) in various implementations. In one example, an implementation may allow multiple names per attribute value, and the same names may be used by different attributes and different attribute values. For example, the name “heavy” may be mapped to attribute values greater than or equal to 12 ounces, while the name “light” is mapped to attribute values below 12 ounces. In addition, the name “professional” may be mapped both to weight values that are greater than 1 pound and to pixel values that are greater than 5 megapixels.
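The mapping above can be sketched as a list of rules, each pairing a name with a condition on a data unit's attribute values. One unit can earn several names, and one name (“professional”) can be reached through different attributes. The thresholds follow the text. The rule list format, the attribute keys `weight_oz` and `megapixels`, and `names_for` are hypothetical.

```python
OUNCES_PER_POUND = 16

# Hypothetical rule table mirroring the text: (name, condition on a data unit).
NAME_RULES = [
    ("heavy", lambda u: u["weight_oz"] >= 12),
    ("light", lambda u: u["weight_oz"] < 12),
    ("professional", lambda u: u["weight_oz"] > 1 * OUNCES_PER_POUND),
    ("professional", lambda u: u["megapixels"] > 5),
]


def names_for(unit):
    """Return every name whose rule the unit satisfies (duplicates removed, order kept)."""
    names = []
    for name, rule in NAME_RULES:
        if rule(unit) and name not in names:
            names.append(name)
    return names


print(names_for({"weight_oz": 20, "megapixels": 12}))
print(names_for({"weight_oz": 7, "megapixels": 4}))
```

A 20-ounce, 12-megapixel camera maps to both “heavy” and “professional,” while a 7-ounce, 4-megapixel camera maps only to “light.”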

At block 307, a property (or properties) that is shared by at least a majority of the data units of the group is determined. At block 309, the group is named based on the determined shared property. The group name is chosen to be succinct, but not necessarily accurate for every data unit in the group. In one example, it may be determined that at least a majority of the data units of a group include the term “farm,” and thus the group is named “farm,” even though at least one of the data units indicates a hobby farm while the majority of the data units indicate an industrial farm. In another example, data units of a data set may represent computers, and the data units of a particular group indicate video cards with at least 1 gigabyte of memory and cooling systems. The group is named “gaming computers,” because such attributes are often associated with computers configured to support the demands of computer games. After the groups are named, the data set and the names are supplied at block 311 (e.g., supplied for transmission, supplied for assembly into a web page, supplied for display, etc.). In one example, naming finds an assignment of groups to names such that, for each group, most of the represented items have an attribute value that maps to the assigned name and most items not in the given group do not have an attribute value that maps to the name. Hence, in one example, a group of data units is presented to a user by its assigned name, followed by its description.
Using the previous example, a group may be presented with the name “professional” followed by the description “SLRs costing more than $2000.” Note that “professional” by itself may not be useful to the user, since the name is assigned to the group only through the mapping of an attribute value to the name. Shown together with the description, however, “SLRs costing more than $2000” reads as an operational definition of “professional,” and “professional” reads as a convenient shorthand for “SLRs costing more than $2000.” The name and the description are each incomplete, yet informative when presented together.
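The name-assignment criterion described above (most group members map to the name, most non-members do not) can be sketched as a scoring loop. A name qualifies only when more than half of the group maps to it and fewer than half of the other units do. Among qualifying names, the one with the largest gap wins. The gap-based tie-breaking, the sample groups, and the functions `assign_names` and `present` are assumptions. The final step pairs each name with a description, as in the “professional” example.

```python
def assign_names(groups, names_of):
    """Pick, for each group, the name that best separates it from the other units.

    groups: dict group_id -> list of unit ids.
    names_of: dict unit id -> set of names its attribute values map to.
    Groups with no qualifying name are assigned None.
    """
    all_units = {u for members in groups.values() for u in members}
    assigned = {}
    for gid, members in groups.items():
        inside = set(members)
        outside = all_units - inside
        candidates = set().union(*(names_of[u] for u in inside))
        best, best_score = None, 0.0
        for name in sorted(candidates):
            in_share = sum(name in names_of[u] for u in inside) / len(inside)
            out_share = sum(name in names_of[u] for u in outside) / len(outside) if outside else 0.0
            if in_share > 0.5 and out_share < 0.5 and in_share - out_share > best_score:
                best, best_score = name, in_share - out_share
        assigned[gid] = best
    return assigned


def present(assigned, descriptions):
    """Show each named group as 'name: description'."""
    return [f"{assigned[g]}: {descriptions[g]}" for g in assigned if assigned[g]]


# Hypothetical groups and the names each unit's attribute values map to.
groups = {"g1": ["c1", "c2"], "g2": ["c3", "c4", "c5"]}
names_of = {
    "c1": {"professional", "heavy"}, "c2": {"professional", "heavy"},
    "c3": {"light"}, "c4": {"light"}, "c5": {"light", "heavy"},
}
assigned = assign_names(groups, names_of)
print(present(assigned, {"g1": "SLRs costing more than $2000", "g2": "compact cameras"}))
```

Group `g1` gets “professional” rather than “heavy” because one unit outside the group is also heavy, which makes “heavy” a weaker discriminator.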

FIG. 4 is a flowchart that depicts an example of a process for determining a shared property. At block 401, data units of a group are mined for one or more values common across a group. The common values may be text, a symbol, a metric for an attribute, etc. For example, each data unit may include a title field for a represented item. In this example, the title field of each data unit is scanned for a token (e.g., word, image, identifier, character, hyperlink, etc.) that occurs in all of the data units of a group. At block 403, it is determined whether a common value is found. If a common value is found, then control flows to block 405. If a common value is not found, then control flows to block 407.

At block 405, the common value is indicated for naming.

At block 407, the data units are examined for related values and/or associations across the group. For instance, identical terms may not be found in every title field, but related terms are found. For example, some data units of a group include “premium audio system” in a field while other data units of the group include “premium sound system” in the field. Data units with these different values may still be grouped together by associations, for example, determined by consulting a structure that indicates similar terms, synonyms, etc. At block 409, the related value(s) is indicated for naming. For example, values from the data units are used to access another structure to find a value(s) that associates with all data units, which is then used for naming.
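The FIG. 4 flow can be sketched over title fields. First look for a token present in every title (blocks 401-405). If none exists, map tokens through a synonym structure and look for a concept shared by every title (blocks 407-409). The tokenizer, the tiny `SYNONYMS` table, and `shared_property` are hypothetical. A real implementation would likely also drop stop words, which this sketch does not.

```python
import re

# Hypothetical synonym structure: each known term maps to a canonical concept.
SYNONYMS = {"audio": "audio", "sound": "audio", "stereo": "audio"}


def tokens(title):
    """Lowercase alphanumeric tokens of a title field."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def shared_property(titles, synonyms=SYNONYMS):
    """Return (kind, values) where kind is "common", "related" or "none"."""
    if not titles:
        return "none", []
    token_sets = [tokens(t) for t in titles]
    common = set.intersection(*token_sets)         # block 401: mine common values
    if common:                                     # blocks 403/405
        return "common", sorted(common)
    # Block 407: map tokens through the synonym structure and intersect again.
    mapped = [{synonyms[t] for t in ts if t in synonyms} for ts in token_sets]
    related = set.intersection(*mapped)
    if related:                                    # block 409
        return "related", sorted(related)
    return "none", []


print(shared_property(["Farm tractor 40hp", "Vintage farm tractor", "Farm tractor parts"]))
print(shared_property(["Hatchback with audio upgrade", "Coupe, sound package"]))
```

The tractor titles share “farm” and “tractor” directly. The car titles share no token, but “audio” and “sound” resolve to the same concept through the synonym table.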

The naming of the groups may be based directly on the determined shared property or based on an association that exists between the shared property or properties and a name. For example, names may be retrieved from a structure that hosts names indexed by an attribute or attribute value.

FIG. 5 is a conceptual depiction of an example of a name structure. In FIG. 5, a name structure 500 is represented with three columns. A first column 501 indicates attribute-based indices of the name structure (e.g., hashes of attribute labels, hashes of attribute tokens, attribute label, land area, class of professional that provides a service, level of difficulty, etc.). A second column 503 indicates conditionals for each entry. A third column 505 indicates two names for each entry.

FIG. 6 is a flowchart that depicts an example process for retrieving a name from a name structure. At block 601, a name structure is accessed with an index that corresponds to an attribute of a group of data units, and an entry is selected. For example, the attribute is “printed” for a group of data units that represent various shirts. At block 603, a property shared by the data units of the group is evaluated against a conditional of the selected entry. For example, the index references a piece of code with a conditional statement that branches to one of two pointers to access the corresponding memory location that hosts a name. For instance, the data units of the group indicate that the represented shirts are printed shirts, and the conditional statement determines whether the “printed” field of each data unit (or an associated field or subfield) indicates a particular shared property of the printed attribute. For example, data units may be evaluated to determine whether the shirts are printed with a comic book character. At block 605, a name is retrieved from the selected entry based on the evaluated conditional. Referring again to the example of represented printed shirts, if the group of data units indicates that the shirts are printed with a comic book character, then a name is retrieved accordingly. Those of ordinary skill in the art should appreciate that names, values, etc. may be stored in any of a variety of data structures (e.g., arrays, binary search trees, hash tables, linked lists, hybrid data structures, etc.) and/or in any of a variety of hardware (e.g., cache, storage, random access memory, portable flash memory, lookup tables, content addressable memory, etc.).
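A name structure with the three columns of FIG. 5 and the retrieval steps of FIG. 6 can be sketched with a hash table. Each key is an attribute label (column 501). Each entry holds a conditional (column 503) and two names (column 505). The example entries, their names, and the helper `retrieve_name` are hypothetical. A dictionary is only one of the storage options the text lists.

```python
from typing import Callable, NamedTuple


class NameEntry(NamedTuple):
    condition: Callable[[list], bool]  # column 503: conditional over the group's values
    if_true: str                       # column 505: name when the conditional holds
    if_false: str                      # column 505: name otherwise


# Hypothetical name structure 500 keyed by attribute label (column 501).
NAME_STRUCTURE = {
    "printed": NameEntry(lambda vals: all(v == "comic character" for v in vals),
                         "comic tees", "graphic tees"),
    "acreage": NameEntry(lambda vals: min(vals) >= 100, "large parcels", "small parcels"),
}


def retrieve_name(attribute, values, structure=NAME_STRUCTURE):
    """Select the entry (block 601), evaluate its conditional (603), return a name (605)."""
    entry = structure.get(attribute)
    if entry is None:
        return None  # no entry indexed by this attribute
    return entry.if_true if entry.condition(values) else entry.if_false


print(retrieve_name("printed", ["comic character", "comic character"]))
print(retrieve_name("printed", ["comic character", "floral"]))
```

A group whose shirts are all printed with a comic character retrieves the first name of the entry. A mixed group falls through to the second name.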

FIG. 7 is a network diagram depicting a client-server system 700, within which one example embodiment may be deployed. A networked system 702, in the example form of a network-based marketplace or publication system, provides server-side functionality, via a network 704 (e.g., the Internet or a Wide Area Network (WAN)), to one or more clients. FIG. 7 illustrates, for example, a web client 706 (e.g., a browser, such as the Internet Explorer browser developed by Microsoft Corporation of Redmond, Wash.), and a programmatic client 708 executing on respective client machines 710 and 712.

An Application Program Interface (API) server 714 and a web server 716 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 718. The application servers 718 host one or more marketplace applications 720 and payment applications 722. The application servers 718 are, in turn, shown to be coupled to one or more database servers 724 that facilitate access to one or more databases 726.

The marketplace applications 720 may provide a number of marketplace functions and services to users that access the networked system 702. The payment applications 722 may likewise provide a number of payment services and functions to users. The payment applications 722 may allow users to accumulate value (e.g., in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as “points”) in accounts, and then later to redeem the accumulated value for products (e.g., goods or services) that are made available via the marketplace applications 720. While the marketplace and payment applications 720 and 722 are shown in FIG. 7 to both form part of the networked system 702, it will be appreciated that, in alternative embodiments, the payment applications 722 may form part of a payment service that is separate and distinct from the networked system 702.

Further, while the system 700 shown in FIG. 7 employs a client-server architecture, the present invention is of course not limited to such an architecture, and could equally well find application in a distributed, or peer-to-peer, architecture system, for example. The various marketplace and payment applications 720 and 722 could also be implemented as standalone software programs, which do not necessarily have networking capabilities.

The web client 706 accesses the various marketplace and payment applications 720 and 722 via the web interface supported by the web server 716. Similarly, the programmatic client 708 accesses the various services and functions provided by the marketplace and payment applications 720 and 722 via the programmatic interface provided by the API server 714. The programmatic client 708 may, for example, be a seller application (e.g., the TurboLister application developed by eBay Inc., of San Jose, Calif.) to enable sellers to author and manage listings on the networked system 702 in an off-line manner, and to perform batch-mode communications between the programmatic client 708 and the networked system 702.

FIG. 7 also illustrates a third party application 728, executing on a third party server machine 730, as having programmatic access to the networked system 702 via the programmatic interface provided by the API server 714. For example, the third party application 728 may, utilizing information retrieved from the networked system 702, support one or more features or functions on a website hosted by the third party. The third party website may, for example, provide one or more promotional, marketplace or payment functions that are supported by the relevant applications of the networked system 702.

Marketplace Applications

FIG. 8 is a block diagram illustrating multiple applications 720 and 722 that, in one example embodiment, are provided as part of the networked system 702. The applications 720 may be hosted on dedicated or shared server machines (not shown) that are communicatively coupled to enable communications between server machines. The applications themselves are communicatively coupled (e.g., via appropriate interfaces) to each other and to various data sources, so as to allow information to be passed between the applications or so as to allow the applications to share and access common data. The applications may furthermore access one or more databases 726 via the database servers 724.

The networked system 702 may provide a number of publishing, listing and price-setting mechanisms whereby a seller may list (or publish information concerning) goods or services for sale, a buyer can express interest in or indicate a desire to purchase such goods or services, and a price can be set for a transaction pertaining to the goods or services. To this end, the marketplace applications 720 are shown to include at least one publication application 800 and one or more auction applications 802 which support auction-format listing and price setting mechanisms (e.g., English, Dutch, Vickrey, Chinese, Double, Reverse auctions etc.). The various auction applications 802 may also provide a number of features in support of such auction-format listings, such as a reserve price feature whereby a seller may specify a reserve price in connection with a listing and a proxy-bidding feature whereby a bidder may invoke automated proxy bidding.

A number of fixed-price applications 804 support fixed-price listing formats (e.g., the traditional classified advertisement-type listing or a catalogue listing) and buyout-type listings. Specifically, buyout-type listings (e.g., including the Buy-It-Now (BIN) technology developed by eBay Inc., of San Jose, Calif.) may be offered in conjunction with auction-format listings, and allow a buyer to purchase goods or services, which are also being offered for sale via an auction, for a fixed price that is typically higher than the starting price of the auction.

Store applications 806 allow a seller to group listings within a “virtual” store, which may be branded and otherwise personalized by and for the seller. Such a virtual store may also offer promotions, incentives and features that are specific and personalized to a relevant seller.

Reputation applications 808 allow users that transact, utilizing the networked system 702, to establish, build and maintain reputations, which may be made available and published to potential trading partners. Consider that where, for example, the networked system 702 supports person-to-person trading, users may otherwise have no history or other reference information whereby the trustworthiness and credibility of potential trading partners may be assessed. The reputation applications 808 allow a user, for example through feedback provided by other transaction partners, to establish a reputation within the networked system 702 over time. Other potential trading partners may then reference such a reputation for the purposes of assessing credibility and trustworthiness.

Personalization applications 810 allow users of the networked system 702 to personalize various aspects of their interactions with the networked system 702. For example, a user may, utilizing an appropriate personalization application 810, create a personalized reference page at which information regarding transactions to which the user is (or has been) a party may be viewed. Further, a personalization application 810 may enable a user to personalize listings and other aspects of their interactions with the networked system 702 and other parties.

The networked system 702 may support a number of marketplaces that are customized, for example, for specific geographic regions. A version of the networked system 702 may be customized for the United Kingdom, whereas another version of the networked system 702 may be customized for the United States. Each of these versions may operate as an independent marketplace, or may be customized (or internationalized) presentations of a common underlying marketplace. The networked system 702 may accordingly include a number of internationalization applications 812 that customize information (and/or the presentation of information) by the networked system 702 according to predetermined criteria (e.g., geographic, demographic or marketplace criteria). For example, the internationalization applications 812 may be used to support the customization of information for a number of regional websites that are operated by the networked system 702 and that are accessible via respective web servers 716.

Navigation of the networked system 702 may be facilitated by one or more navigation applications 814. For example, a search application (as an example of a navigation application) may enable key word searches of listings published via the networked system 702. A browse application may allow users to browse various category, catalogue, or inventory data structures according to which listings may be classified within the networked system 702. Various other navigation applications may be provided to supplement the search and browsing applications.
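A key word search over listings might, as a minimal sketch, rely on an inverted index from tokens to listing identifiers. The tokenizer, the AND semantics across keywords, and the names `tokenize`, `build_index` and `keyword_search` are assumptions for illustration, not the design of the navigation applications 814.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a simplifying assumption, not the system's tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(listings: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the IDs of listings whose title contains it."""
    index: dict[str, set[int]] = {}
    for listing_id, title in listings.items():
        for token in tokenize(title):
            index.setdefault(token, set()).add(listing_id)
    return index


def keyword_search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return sorted IDs of listings containing every query keyword (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)
```

Building the index once and intersecting posting sets per query keeps each search proportional to the size of the matching sets rather than to the whole catalogue.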

In order to make listings available via the networked system 702 as visually informative and attractive as possible, the marketplace applications 720 may include one or more imaging applications 816 with which users may upload images for inclusion within listings. An imaging application 816 also operates to incorporate images within viewed listings. The imaging applications 816 may also support one or more promotional features, such as image galleries that are presented to potential buyers. For example, sellers may pay an additional fee to have an image included within a gallery of images for promoted items.

Listing creation applications 818 allow sellers to conveniently author listings pertaining to goods or services that they wish to transact via the networked system 702, and listing management applications 820 allow sellers to manage such listings. Specifically, where a particular seller has authored and/or published a large number of listings, the management of such listings may present a challenge. The listing management applications 820 provide a number of features (e.g., auto-relisting, inventory level monitors, etc.) to assist the seller in managing such listings. One or more post-listing management applications 822 also assist sellers with a number of activities that typically occur post-listing. For example, upon completion of an auction facilitated by one or more auction applications 802, a seller may wish to leave feedback regarding a particular buyer. To this end, a post-listing management application 822 may provide an interface to one or more reputation applications 808, so as to allow the seller to conveniently provide feedback regarding multiple buyers to the reputation applications 808.

Dispute resolution applications 824 provide mechanisms whereby disputes arising between transacting parties may be resolved. For example, the dispute resolution applications 824 may provide guided procedures whereby the parties are guided through a number of steps in an attempt to settle a dispute. In the event that the dispute cannot be settled via the guided procedures, the dispute may be escalated to a third party mediator or arbitrator.

A number of fraud prevention applications 826 implement fraud detection and prevention mechanisms to reduce the occurrence of fraud within the networked system 702.

Messaging applications 828 are responsible for the generation and delivery of messages to users of the networked system 702, such messages, for example, advising users regarding the status of listings at the networked system 702 (e.g., providing “outbid” notices to bidders during an auction process or providing promotional and merchandising information to users). Respective messaging applications 828 may utilize any one of a number of message delivery networks and platforms to deliver messages to users. For example, messaging applications 828 may deliver electronic mail (e-mail), instant message (IM), Short Message Service (SMS), text, facsimile, or voice (e.g., Voice over IP (VoIP)) messages via wired (e.g., the Internet), Plain Old Telephone Service (POTS), or wireless (e.g., mobile, cellular, WiFi, WiMAX) networks.

Merchandising applications 830 support various merchandising functions that are made available to sellers to enable sellers to increase sales via the networked system 702. The merchandising applications 830 also operate the various merchandising features that may be invoked by sellers, and may monitor and track the success of merchandising strategies employed by sellers.

The networked system 702 itself, or one or more parties that transact via the networked system 702, may operate loyalty programs that are supported by one or more loyalty/promotions applications 832. For example, a buyer may earn loyalty or promotions points for each transaction established and/or concluded with a particular seller, and be offered a reward for which accumulated loyalty points can be redeemed.
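Point accrual and redemption could be sketched as follows, assuming points are earned per whole currency unit spent with a given seller and are redeemed in fixed blocks. `POINTS_PER_UNIT`, `REWARD_THRESHOLD` and `LoyaltyLedger` are hypothetical; the specification does not state an earning or redemption rule.

```python
from collections import defaultdict

POINTS_PER_UNIT = 1     # hypothetical: one point per whole currency unit spent
REWARD_THRESHOLD = 100  # hypothetical: points needed for one reward


class LoyaltyLedger:
    """Tracks loyalty points a buyer has earned with a particular seller."""

    def __init__(self) -> None:
        self.points = defaultdict(int)  # (buyer, seller) -> point balance

    def record_transaction(self, buyer: str, seller: str, amount: float) -> None:
        # Fractional currency units earn no points under this assumed rule.
        self.points[(buyer, seller)] += int(amount) * POINTS_PER_UNIT

    def redeem(self, buyer: str, seller: str) -> int:
        """Redeem as many whole rewards as the balance allows; return the count."""
        balance = self.points[(buyer, seller)]
        rewards = balance // REWARD_THRESHOLD
        self.points[(buyer, seller)] = balance - rewards * REWARD_THRESHOLD
        return rewards
```

Keying balances by the (buyer, seller) pair reflects the example above, in which points are earned with a particular seller; a marketplace-wide program would key by buyer alone.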

Data Structures

FIG. 9 is a high-level entity-relationship diagram, illustrating various tables 900 that may be maintained within the databases 726, and that are utilized by and support the applications 720 and 722. A user table 902 contains a record for each registered user of the networked system 702, and may include identifier, address and financial instrument information pertaining to each such registered user. A user may operate as a seller, a buyer, or both, within the networked system 702. In one example embodiment, a buyer may be a user that has accumulated value (e.g., commercial or proprietary currency), and is accordingly able to exchange the accumulated value for items that are offered for sale by the networked system 702.

The tables 900 also include an items table 904 in which are maintained item records for goods and services that are available to be, or have been, transacted via the networked system 702. Each item record within the items table 904 may furthermore be linked to one or more user records within the user table 902, so as to associate a seller and one or more actual or potential buyers with each item record.

A transaction table 906 contains a record for each transaction (e.g., a purchase or sale transaction) pertaining to items for which records exist within the items table 904.

An order table 908 is populated with order records, each order record being associated with an order. Each order, in turn, may be with respect to one or more transactions for which records exist within the transaction table 906.

Bid records within a bids table 910 each relate to a bid received at the networked system 702 in connection with an auction-format listing supported by an auction application 802. A feedback table 912 is utilized by one or more reputation applications 808, in one example embodiment, to construct and maintain reputation information concerning users. A history table 914 maintains a history of transactions to which a user has been a party. One or more attributes tables 916 record attribute information pertaining to items for which records exist within the items table 904. Considering only a single example of such an attribute, the attributes tables 916 may indicate a currency attribute associated with a particular item, the currency attribute identifying the currency of a price for the relevant item as specified by a seller.

FIG. 10 provides further details regarding the attributes tables 916 that are shown in FIG. 9 to be maintained within the databases 726. A field 1002 indicates a value for an attribute.
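The tables of FIG. 9 and the attribute value field 1002 of FIG. 10 might be modeled as plain records, as in the following minimal Python sketch. The class and field names are hypothetical, only a subset of the tables is shown (the feedback table 912 and history table 914 are omitted), and links between tables are represented by identifier fields rather than database foreign keys.

```python
from dataclasses import dataclass, field


@dataclass
class User:  # user table 902
    user_id: int
    address: str


@dataclass
class Item:  # items table 904
    item_id: int
    seller_id: int  # link to a record in the user table 902
    buyer_ids: list[int] = field(default_factory=list)  # actual or potential buyers


@dataclass
class Transaction:  # transaction table 906
    transaction_id: int
    item_id: int  # item record in the items table 904


@dataclass
class Order:  # order table 908
    order_id: int
    transaction_ids: list[int]  # one or more records in the transaction table 906


@dataclass
class Bid:  # bids table 910
    bid_id: int
    item_id: int
    bidder_id: int
    amount: float


@dataclass
class AttributeRecord:  # attributes tables 916
    item_id: int
    name: str   # e.g. "currency"
    value: str  # field 1002: the value for the attribute


def attributes_for(item_id: int, records: list[AttributeRecord]) -> dict[str, str]:
    """Collect the attribute values recorded for one item."""
    return {r.name: r.value for r in records if r.item_id == item_id}
```

A lookup such as `attributes_for` is the kind of access the attribute-based grouping described earlier would need, since each data unit's attribute values come from these records.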

FIG. 11 shows a diagrammatic representation of a machine in the example form of a computer system 1100 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 1100 includes a processor 1102 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 1104 and a static memory 1106, which communicate with each other via a bus 1108. The computer system 1100 may further include a video display unit 1110 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 1100 also includes an alphanumeric input device 1112 (e.g., a keyboard), a cursor control device 1114 (e.g., a mouse), a disk drive unit 1116, a signal generation device 1118 (e.g., a speaker) and a network interface device 1120.

The disk drive unit 1116 includes a machine-readable medium 1122 on which are stored one or more sets of instructions (e.g., software 1124) embodying any one or more of the methodologies or functions described herein. The software 1124 may also reside, completely or at least partially, within the main memory 1104 and/or within the processor 1102 during execution thereof by the computer system 1100, the main memory 1104 and the processor 1102 also constituting machine-readable media.

The software 1124 may further be transmitted or received over a network 1126 via the network interface device 1120.

While the machine-readable medium 1122 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.

Thus, a method and system to organize a data set in groups with multiple passes of different organizing techniques and to automatically name the groups have been described. Although the present invention has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

The Abstract of the Disclosure is provided to comply with 37 C.F.R. § 1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

1. A method comprising: identifying a first cluster of data items among a plurality of data items in response to a query, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being performed by a processor of a machine and based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster; storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and naming the second cluster based on a property shared by a majority of the second cluster.
2. The method of claim 1, wherein: the identifying of the first cluster of data items includes: performing a statistical analysis of the plurality of data items; and clustering at least some of the plurality of data items based on the statistical analysis to form the first cluster of data items.
3. The method of claim 2, wherein: the performing of the statistical analysis includes processing information selected from a group consisting of: the attribute included in each of the plurality of data items, a number of attributes included in at least some of the plurality of data items, a description of an item represented by one of the plurality of data items, an amount of text in the description of the item, a taxonomy of the item, a probability that the item co-occurs with a further item represented by another one of the plurality of data items, and user behavior data.
4. The method of claim 2, wherein: the performing of the statistical analysis is based on a technique selected from a group consisting of: k-means clustering, fuzzy clustering, and agglomerative clustering.
5. The method of claim 2, wherein: the performing of the statistical analysis includes identifying a further cluster of data items among the plurality of data items, the common value of the attribute being present in each data item within the further cluster.
6. The method of claim 2, wherein: the performing of the statistical analysis includes identifying a further cluster of data items among the plurality of data items, the common value of the attribute being absent from each data item within the further cluster.
7. The method of claim 1, wherein: each data item within the second cluster represents one of a plurality of items; and the method further comprises determining a property shared by a majority of data items within the second cluster of data items.
8. The method of claim 7, wherein: the determining of the property is based on the common value of the attribute present in each data item within the second cluster.
9. The method of claim 7, wherein: the determining of the property is based on a further attribute present in a majority of data items within the second cluster.
10. The method of claim 9, wherein: the determining of the property includes identifying a further value of the further attribute included in a majority of data items within the second cluster.
11. The method of claim 10, wherein: the identifying of the further value is based on a data structure that includes at least one of a synonym of the further value or a term similar to the further value.
12. The method of claim 1, wherein: the naming of the second cluster is based on the common value of the attribute present in each data item within the second cluster.
13. A system comprising: a navigation module configured to access a plurality of data items, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; and a processor configured by an organizing module communicatively coupled to the navigation module, the organizing module configured to: identify a first cluster of data items among the plurality of data items in response to a query; subdivide the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster; store the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and name the second cluster based on a property shared by a majority of the second cluster.
14. The system of claim 13, wherein the organizing module is further configured to: identify the first cluster of data items by performing a statistical analysis of the plurality of data items; and cluster at least some of the plurality of data items based on the statistical analysis to form the first cluster of data items.
15. The system of claim 13, wherein: each data item within the second cluster represents one of a plurality of items; and the organizing module is further configured to determine a property shared by a majority of data items within the second cluster of data items.
16. The system of claim 15, wherein the organizing module is further configured to: determine the property based on the common value of the attribute present in each data item within the second cluster.
17. The system of claim 15, wherein the organizing module is further configured to: determine the property based on a further attribute present in a majority of data items within the second cluster.
18. A non-transitory machine-readable storage medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising: identifying a first cluster of data items among a plurality of data items in response to a query, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster; storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and naming the second cluster based on a property shared by a majority of the second cluster.
19. A system comprising: means for accessing a plurality of data items, each of the plurality of data items including an attribute able to have one of a plurality of values of the attribute; and means for: identifying a first cluster of data items among the plurality of data items in response to a query; subdividing the first cluster of data items into a second cluster of data items and a third cluster of data items, the subdividing of the first cluster being based on a common value of the attribute, the common value being present in each data item within the second cluster and absent from each data item within the third cluster; storing the second cluster of data items as corresponding to the common value of the attribute, each data item within the second cluster representing one of a plurality of items; and naming the second cluster based on a property shared by a majority of the second cluster.
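The two-pass method recited in claim 1 can be illustrated with a minimal Python sketch. It assumes each data item carries a numeric price used for the first, statistical pass (k-means, one of the techniques recited in claim 4), a dictionary of attribute values used for the second pass, and a naming rule that combines the common value with any further attribute value shared by a strict majority of the group (in the spirit of claims 9 to 12). All names (`DataItem`, `kmeans_1d`, `subdivide`, `name_cluster`, `organize`) and the name format are hypothetical; query-driven retrieval and the synonym structure of claim 11 are not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DataItem:
    item_id: int
    price: float  # numeric feature for the statistical first pass (an assumption)
    attributes: dict[str, str] = field(default_factory=dict)


def kmeans_1d(items: list[DataItem], k: int, iterations: int = 20) -> list[list[DataItem]]:
    """First pass: k-means clustering on price."""
    prices = sorted(item.price for item in items)
    # Deterministic initialisation: evenly spaced order statistics of the prices.
    centroids = [prices[j * (len(prices) - 1) // max(k - 1, 1)] for j in range(k)]
    clusters: list[list[DataItem]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for item in items:
            nearest = min(range(k), key=lambda j: abs(item.price - centroids[j]))
            clusters[nearest].append(item)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [sum(i.price for i in c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def subdivide(cluster, attribute, value):
    """Second pass: split on whether each item has `attribute == value`."""
    second = [i for i in cluster if i.attributes.get(attribute) == value]
    third = [i for i in cluster if i.attributes.get(attribute) != value]
    return second, third


def name_cluster(cluster, attribute, value):
    """Name from the common value plus a further value held by a strict majority."""
    counts = Counter((a, v) for item in cluster
                     for a, v in item.attributes.items() if a != attribute)
    if counts:
        (_, further_value), n = counts.most_common(1)[0]
        if n > len(cluster) / 2:
            return f"{value} {further_value}"
    return value


def organize(items, k, attribute, value):
    """Return (name, items) for the matching subset of each first-pass cluster."""
    named = []
    for first in kmeans_1d(items, k):
        second, _third = subdivide(first, attribute, value)
        if second:
            named.append((name_cluster(second, attribute, value), second))
    return named
```

For example, five books priced 10, 12, 11, 95 and 100, subdivided on `subject == "automotive"`, would yield one low-price group named from the shared `paperback` format and one high-price group named from the `hardcover` format. Using price as the clustering feature is purely illustrative; claim 3 lists other inputs the statistical pass may use.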